The code for this work can be found here. This is a summary of the ideas behind the project.

Data

We obtained the data from the webpage propiedades.com. For each apartment we use the address, the number of bedrooms, bathrooms and half bathrooms, the \(m^2\) of the apartment, the maintenance fee, the rent price, and whether the rent price is in MXN or USD. We then clean the raw data and remove duplicates. After that we use the ggmap package to get the latitude and longitude of each apartment (validating the address at the same time). Then we homologate the currency of the rent prices (to MXN) and finally we impute missing values for the number of bathrooms and the \(m^2\), since those values were missing for some apartments.
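A minimal sketch of those cleaning steps in base R follows. The column names (dir, precio, moneda, banios.m, const.m) and the exchange rate are illustrative assumptions, and the ggmap geocoding step is omitted; the real pipeline may differ.

```r
# Sketch of the cleaning pipeline (illustrative column names):
# deduplicate, homologate currency to MXN, impute missing values.
clean_rentals <- function(raw, usd_to_mxn = 17) {
  df <- raw[!duplicated(raw$dir), ]          # remove duplicated listings
  usd <- df$moneda == "USD"                  # homologate rent prices to MXN
  df$precio[usd] <- df$precio[usd] * usd_to_mxn
  for (col in c("banios.m", "const.m")) {    # impute bathrooms and m^2
    df[[col]][is.na(df[[col]])] <- median(df[[col]], na.rm = TRUE)
  }
  df
}
```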

A sample of the data frame we use looks as follows:

library(dplyr)
library(knitr)
datos <- readRDS('/Users/Gers/Dropbox/Data Incubator/Project/data/datos_finales.rds')

kable(datos %>%
        dplyr::select(dir,rec,banios.m,const.m,est,precio,lon,lat)%>%
        sample_n(20))
|     | dir | rec | banios.m | const.m | est | precio | lon | lat |
|----:|:----|----:|---------:|--------:|----:|-------:|----------:|---------:|
| 862 | Rio Lerma 76 Centro Cuauhtémoc 06500 Distrito Federal | 1 | 1 | 56.00000 | 0 | 18500 | -99.16521 | 19.43011 |
| 358 | Bosque De Jazmines Bosque de las Lomas 11700 Distrito Federal | 1 | 1 | 80.00000 | 0 | 12500 | -99.24378 | 19.40943 |
| 797 | Av Paseo de la Reforma Juárez 06600 Distrito Federal | 1 | 1 | 171.94409 | 0 | 20000 | -99.19373 | 19.42571 |
| 110 | Arquímedes Polanco V Sección 11560 Distrito Federal | 3 | 3 | 166.00000 | 2 | 32000 | -99.19196 | 19.42713 |
| 378 | Sierra Gorda Lomas de Chapultepec III Sección 11000 Distrito Federal | 3 | 3 | 350.00000 | 3 | 120000 | -99.22237 | 19.42814 |
| 513 | Concepción Beistegui Del Valle Centro 03100 Distrito Federal | 2 | 2 | 190.00000 | 1 | 29000 | -99.16890 | 19.39033 |
| 534 | Dakota 412 Napoles 03810 Distrito Federal | 1 | 1 | 50.00000 | 0 | 10000 | -99.18011 | 19.38605 |
| 765 | Rio Po Poniente Cuauhtémoc 06500 Distrito Federal | 2 | 2 | 130.00000 | 0 | 18500 | -99.16833 | 19.42979 |
| 579 | Calle 18 San Pedro de los Pinos 03800 Distrito Federal | 2 | 2 | 71.00000 | 2 | 12800 | -99.18977 | 19.38747 |
| 741 | Reforma 222 222 Centro Juárez 06600 Distrito Federal | 2 | 2 | 90.00000 | 0 | 40000 | -99.16204 | 19.42909 |
| 432 | Avenida 96 San Pedro de los Pinos 03800 Distrito Federal | 2 | 2 | 80.00000 | 1 | 15000 | -99.18894 | 19.38570 |
| 185 | Mariano Escobedo 1 Anzures 11590 Distrito Federal | 3 | 4 | 250.00000 | 2 | 45000 | -99.18206 | 19.45008 |
| 315 | Lago Mask Los Manzanos 11460 Distrito Federal | 2 | 2 | 63.82000 | 1 | 8500 | -99.18458 | 19.44782 |
| 554 | Luz Saviñón Del Valle Norte 03103 Distrito Federal | 3 | 2 | 103.00000 | 1 | 15500 | -99.16682 | 19.39260 |
| 759 | Av Insurgentes 23 Centro San Rafael 06470 Distrito Federal | 1 | 1 | 70.00000 | 0 | 12000 | -99.15634 | 19.43881 |
| 878 | Tokio 18 Juárez 06600 Distrito Federal | 1 | 1 | 32.71142 | 0 | 10500 | -99.17064 | 19.42420 |
| 174 | Cofre De Perote 371 Lomas de Chapultepec I Sección 11000 Distrito Federal | 3 | 2 | 220.00000 | 2 | 33000 | -99.22588 | 19.42383 |
| 627 | Martín Mendalde 986 Del Valle Sur 03104 Distrito Federal | 2 | 2 | 110.00000 | 2 | 16500 | -99.16785 | 19.38092 |
|  17 | Monte Elbruz 1 Lomas de Chapultepec I Sección 11000 Distrito Federal | 3 | 4 | 210.00000 | 2 | 38000 | -99.20395 | 19.42962 |
| 411 | Moras 321 Del Valle Sur 03104 Distrito Federal | 3 | 2 | 200.00000 | 2 | 28000 | -99.17322 | 19.37640 |

The plot of the apartments over a map looks like this:

datos <- readRDS('/Users/Gers/Dropbox/Data Incubator/Project/data/datos_finales.rds')

latq <- quantile(datos$lat,c(0.38))
lonq <- quantile(datos$lon,c(0.28))

df <- get_map(location = c(lonq,latq),
              maptype = 'roadmap', zoom = 12)

ggmap(df, extent = 'panel') + 
  geom_point(aes(x=lon,y=lat),colour = 'blue', alpha = 1,
             size = 2, data = datos)

Exploratory data analysis

We want to look at the rent price dispersion over the addresses so we make the following graph:

datos <- readRDS('/Users/Gers/Dropbox/Data Incubator/Project/data/datos_finales.rds')

qmplot(lon, lat, data=datos, 
       size=precio.c, alpha=I(0.8), color=precio_log,
       geom=c('point'), source='google')

That graph shows that there are indeed some areas where the most expensive apartments are concentrated. The idea is to apply spatial models to these data, so we plot the level curves of the prices to check that there is a zone where apartment rent prices are highest and that prices drop as you move away from that zone.

fma <- precio_log ~ 1 + x + y + I(x*y) + I(x^2) + I(y^2) + 
  I(x^2*y) + I(x*y^2) + I(x^3) + I(y^3) 
mod.1 <- lm(formula = fma, data = datos)
summary(mod.1)
## 
## Call:
## lm(formula = fma, data = datos)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.54645 -0.33878 -0.03304  0.31756  1.90224 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.032e+01  2.988e-02 345.226  < 2e-16 ***
## x           -1.547e+01  1.700e+00  -9.098  < 2e-16 ***
## y            1.065e+01  1.599e+00   6.657 4.61e-11 ***
## I(x * y)     1.439e+02  4.407e+01   3.264 0.001134 ** 
## I(x^2)      -1.241e+02  2.833e+01  -4.382 1.30e-05 ***
## I(y^2)      -5.855e+02  3.800e+01 -15.407  < 2e-16 ***
## I(x^2 * y)  -4.187e+03  1.236e+03  -3.388 0.000732 ***
## I(x * y^2)   6.516e+03  1.432e+03   4.549 6.05e-06 ***
## I(x^3)       1.438e+03  8.178e+02   1.759 0.078943 .  
## I(y^3)      -1.151e+04  1.104e+03 -10.424  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5416 on 990 degrees of freedom
## Multiple R-squared:  0.3608, Adjusted R-squared:  0.355 
## F-statistic: 62.09 on 9 and 990 DF,  p-value: < 2.2e-16
surf <- akima::interp(x=datos$x, y=datos$y, z=datos$precio_log, 
                      xo = seq(min(datos$x), max(datos$x), length = 50), 
                      yo = seq(min(datos$y), max(datos$y), length = 50),
                      linear = FALSE, extrap = TRUE,
                      duplicate = "median")

surf.r <- cbind(expand.grid(surf$x, surf$y), c(surf$z))
colnames(surf.r) <- c("x", "y", "z")
surf.r$z <- predict(mod.1, surf.r)
bks <- as.numeric(quantile(surf.r$z))
surf.r$pr <- as.factor(cut(surf.r$z,breaks = bks))
ggplot(surf.r, aes(x = x, y = y, z = z)) + 
  geom_tile(aes(fill = pr)) + 
  scale_fill_brewer(palette = "Blues") + 
  stat_contour()

The previous graph confirms that the apartment rent price decreases as we move away from the Polanco zone; it also shows that rent prices are similar for nearby apartments, so fitting a spatial model seems reasonable.
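One way to quantify that similarity between nearby apartments (a supplementary check, not part of the original analysis) is Moran's I with inverse-distance weights; values clearly above the null expectation of \(-1/(n-1)\) indicate positive spatial autocorrelation. A base R sketch:

```r
# Moran's I with inverse-distance weights.
# Positive values mean nearby observations tend to be similar.
morans_i <- function(z, coords) {
  n <- length(z)
  w <- 1 / as.matrix(dist(coords))  # inverse-distance weight matrix
  diag(w) <- 0                      # no self-weight
  zc <- z - mean(z)
  (n / sum(w)) * sum(w * outer(zc, zc)) / sum(zc^2)
}

# e.g. morans_i(datos$precio_log, datos[, c("x", "y")])
```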

Simple Kriging

datos_2 <- readRDS('/Users/Gers/Dropbox/Data Incubator/Project/data/datos_finales.rds')
datos_2$xx<-with(datos_2,x+media_lon)
datos_2$yy<-with(datos_2,y+media_lat)

We now proceed to the model. For this we use bathrooms, bedrooms, \(m^2\) and parking spaces as covariates.
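In standard geostatistical notation (the symbols are ours, not from the original code), the model decomposes the log-price at a location \(s\) as

\[
\text{precio\_log}(s) = \mathbf{x}(s)^{\top}\boldsymbol{\beta} + \eta(s) + \varepsilon(s),
\]

where \(\mathbf{x}(s)\) collects the covariates (coordinates, bedrooms, bathrooms, \(m^2\), parking spaces), \(\eta(s)\) is a zero-mean stationary Gaussian process capturing small-scale spatial variation, and \(\varepsilon(s)\) is independent noise with nugget variance \(\tau^2\).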

datos <- datos_2 %>% 
  dplyr::select(rec,banios.m,const.m,est,precio_log,x,y)

# Kriging

# Large-scale (trend) variation estimate:
# linear in the coordinates, plus the covariates
fma <- precio_log ~ 1 + x + y + rec + banios.m + const.m + est
mod.1 <- lm(formula = fma, data = datos)
summary(mod.1)
## 
## Call:
## lm(formula = fma, data = datos)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.81971 -0.23653  0.00519  0.26291  1.58607 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  9.265328   0.039863 232.430  < 2e-16 ***
## x           -2.934965   0.559000  -5.250 1.86e-07 ***
## y            3.242202   0.550129   5.894 5.17e-09 ***
## rec         -0.134960   0.020497  -6.585 7.38e-11 ***
## banios.m     0.210040   0.022063   9.520  < 2e-16 ***
## const.m      0.003588   0.000191  18.790  < 2e-16 ***
## est          0.057837   0.014082   4.107 4.33e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4093 on 993 degrees of freedom
## Multiple R-squared:  0.6338, Adjusted R-squared:  0.6316 
## F-statistic: 286.4 on 6 and 993 DF,  p-value: < 2.2e-16
# Small-scale variation estimate
datos$mu_hat <- mod.1$fitted.values
datos <- mutate(datos, eta_hat = precio_log - mu_hat)

renta_datos <- data.frame(x = datos$x, y = datos$y, eta_hat = datos$eta_hat)
coordinates(renta_datos) = ~ x + y 
emp_variog <- variogram(eta_hat ~ 1, data = renta_datos, width = 0.001)

For the model we fit the empirical semivariogram (it tells us how the average variability of prices changes with the distance between apartments) in order to check for spatial dependence in the apartment rent prices. Alongside the empirical semivariogram we plot the theoretical one; we fit a spherical semivariogram because we expect the spatial dependence to decay approximately linearly as the distance increases.
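The spherical semivariogram has the closed form (with nugget \(\tau^2\), partial sill \(\sigma^2\) and range \(\phi\))

\[
\gamma(h) =
\begin{cases}
\tau^2 + \sigma^2\left(\dfrac{3h}{2\phi} - \dfrac{h^3}{2\phi^3}\right) & 0 < h < \phi,\\[2mm]
\tau^2 + \sigma^2 & h \ge \phi,
\end{cases}
\]

which is exactly what the sph.variog function below evaluates.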

sph.variog <- function(sigma2, phi, tau2, dist){
  n <- length(dist)
  sph.variog.vec <- rep(0,n)
  for(i in 1:n){
    if(dist[i] < phi){
      sph.variog.vec[i] <- tau2 + (sigma2 * (((3 * dist[i]) / (2 * phi)) - 
                                               ((dist[i] ^ 3)/(2 * (phi ^ 3)))))
    }
    if(dist[i] >= phi){
      sph.variog.vec[i] <- tau2 + sigma2  
    }
  }
  return(sph.variog.vec)
}

sph_variog.w <- fit.variogram(emp_variog, 
                              vgm(psill= 0.05, model="Sph", range = 0.03, nugget = 0.1), 
                              fit.method = 7)
sigma2 <- sph_variog.w$psill[2]
phi <- sph_variog.w$range[2]
tau2 <- sph_variog.w$psill[1]
dist <- sph_variog.w$dist # distance vector
dat <- data.frame(x = seq(0, 0.06, 0.0001), y = seq(0, 0.12, 0.0002))
ggplot(dat, aes(x = x, y = y)) +  ylim(0, 0.2) +
  labs(title = expression("Fitted spherical semivariogram"),
       x = "distance", y = "semivariance") +
  stat_function(fun = sph.variog, args = list(sigma2 = sigma2, phi = phi, tau2 = tau2),
                colour = "green3") + 
  geom_point(data = emp_variog,  aes(x = dist, y = gamma))

The Kriging model fitted is:

# Estimate beta using the estimated covariance matrix
trend <- ~ 1 + datos$x + datos$y + datos$rec + 
  datos$banios.m + datos$const.m + datos$est
depas_geo <- as.geodata(obj = datos, coords.col = c(6, 7), data.col = 5)
depas_reml <- likfit(depas_geo, trend = trend, cov.model = "spherical", 
                     ini.cov.pars = c(sigma2, phi), nugget = tau2, 
                     fix.nugget = FALSE, lik.met = "REML")
saveRDS(depas_reml,'depas_reml.rds')

The \(\beta\)’s of the model are:

depas_reml <- readRDS('/Users/Gers/Dropbox/Data Incubator/Project/data/depas_reml.rds')
kable(depas_reml$beta)
|           |            |
|:----------|-----------:|
| intercept |  8.9575888 |
| covar1    | -4.6482967 |
| covar2    | -3.9731080 |
| covar3    | -0.0017291 |
| covar4    |  0.1233528 |
| covar5    |  0.0027884 |
| covar6    |  0.0569030 |
kable(mod.1$coeff)
|             |            |
|:------------|-----------:|
| (Intercept) |  9.2653279 |
| x           | -2.9349652 |
| y           |  3.2422019 |
| rec         | -0.1349601 |
| banios.m    |  0.2100401 |
| const.m     |  0.0035882 |
| est         |  0.0578370 |
sigma2.reml <- depas_reml$sigmasq
phi.reml <- depas_reml$phi
tau2.reml <- depas_reml$tausq

#Kriging
kc.control <- krige.control(type.krige = "ok", 
                            trend.d = trend,
                            trend.l = trend, 
                            obj.model=depas_reml,
                            cov.model = "spherical",
                            cov.pars=c(sigma2.reml, phi.reml), 
                            nugget = tau2.reml)
loc <- matrix(c(datos$x,datos$y), nrow=nrow(datos), ncol=2)

# Prediction
kc.sk.s0 <- krige.conv(depas_geo,
                       locations=loc,
                       krige=kc.control)
## krige.conv: model with mean defined by covariates provided by the user
## krige.conv: Kriging performed using global neighbourhood
datos$ks_precio <- kc.sk.s0$predict

And the predictions obtained look like this:

datos <- readRDS('/Users/Gers/Dropbox/Data Incubator/Project/data/datos_prediccion.rds')

datos$pks_precio.c <- quantileCut(datos$ks_precio,5)

ggplot(datos, aes(x = x, y = y, colour = pks_precio.c)) +
  geom_point(size = 2.3) +
  scale_colour_brewer(palette = "Blues")

mean_lon <- -99.18264
mean_lat <- 19.41507

aux <- datos %>% 
  dplyr::select(x, y, ks_precio,pks_precio.c) %>%
  mutate(lon = x + mean_lon, lat= y + mean_lat )

qmplot(lon, lat, data=aux,
       size=pks_precio.c, alpha=I(0.8), color=ks_precio,
       geom=c('point'), source='google')

Bayesian Kriging

We now proceed to the Bayesian version of Kriging. For this we need priors for the sill, nugget and range: for the sill and nugget we use Inverse-Gamma priors, and for the range a Uniform distribution. With these priors and the data we fit the model using 5000 MCMC iterations.

datos <- readRDS('/Users/Gers/Dropbox/Data Incubator/Project/data/datos_finales.rds') %>% 
  dplyr::select(rec,banios.m,const.m,est,precio_log,x,y)

N <- nrow(datos)
coords <- as.matrix(cbind(datos$x,datos$y),nrow=N,ncol=2)
Y <- datos$precio_log

Dist.mat <- rdist(coords)
max(Dist.mat)

# Initial values
beta.ini <- rep(0,5)
sigma2.ini <- 1/rgamma(1,2,1)
tau2.ini <- 1/rgamma(1,2,0.5)
phi.ini <- runif(1,min=0.0008,max=0.06)

# Model fit
fma <- Y ~ datos$x + datos$y + datos$rec + datos$banios.m +
  datos$const.m + datos$est
model.1 <- spLM(formula = fma, 
                coords=coords,starting=list("phi"=phi.ini,"sigma.sq"=sigma2.ini, "tau.sq"=tau2.ini),
                tuning=list("phi"=0.05, "sigma.sq"=0.1, "tau.sq"=0.05),
                priors=list("phi.Unif"=c(0.0008, 0.06), "sigma.sq.IG"=c(2, 1),
                            "tau.sq.IG"=c(2, 0.5)), cov.model="spherical",
                n.samples=5000, verbose=TRUE, n.report=100)
saveRDS(model.1,'model.rds')
model.1 <- readRDS('/Users/Gers/Dropbox/Data Incubator/Project/data/model.rds')

Some draws of the covariance parameters look like this:

kable(model.1$p.theta.samples[1:5,])
| sigma.sq  | tau.sq    | phi       |
|----------:|----------:|----------:|
| 0.3401744 | 0.4894106 | 0.0112989 |
| 0.3401744 | 0.4894106 | 0.0112989 |
| 0.3155258 | 0.4291275 | 0.0120334 |
| 0.3155258 | 0.4291275 | 0.0120334 |
| 0.2130951 | 0.3585916 | 0.0093130 |

The marginal posterior distributions of the parameters look like this:

# Plots and summary of the marginal posterior distribution
# of the covariance parameters
plot(model.1$p.theta.samples)

print(summary(model.1$p.theta.samples[1001:5000,]))
##     sigma.sq          tau.sq             phi         
##  Min.   : 47.71   Min.   :0.05635   Min.   :0.01391  
##  1st Qu.: 98.64   1st Qu.:0.06979   1st Qu.:0.03513  
##  Median :121.51   Median :0.07325   Median :0.04810  
##  Mean   :136.60   Mean   :0.07334   Mean   :0.04449  
##  3rd Qu.:158.26   3rd Qu.:0.07648   3rd Qu.:0.05376  
##  Max.   :491.97   Max.   :0.09247   Max.   :0.05931

The 95% probability intervals of the parameters are:

# 95% probability intervals for the covariance parameters
apply(model.1$p.theta.samples[1001:5000,],2,quantile,c(0.025,0.975))
##        sigma.sq     tau.sq        phi
## 2.5%   70.66285 0.06476423 0.02036274
## 97.5% 270.07822 0.08329890 0.05820462
# Recover the regression coefficients with a burn-in of 1000
model.1.recov <- spRecover(model.1, start=1000, verbose=TRUE)
saveRDS(model.1.recov, 'model.1.recov.rds')

The marginal posterior distributions of the \(\beta\) coefficients look like this:

model.1.recov <- readRDS('/Users/Gers/Dropbox/Data Incubator/Project/data/model.1.recov.rds')

# Plots and summaries of the marginal posterior distribution
# of the beta coefficients (regression parameters)
plot(model.1.recov$p.beta.recover.samples[,1:3])

plot(model.1.recov$p.beta.recover.samples[,4:5])

plot(model.1.recov$p.beta.recover.samples[,6:7])

summary(model.1.recov$p.beta.recover.samples)
## 
## Iterations = 1:4001
## Thinning interval = 1 
## Number of chains = 1 
## Sample size per chain = 4001 
## 
## 1. Empirical mean and standard deviation for each variable,
##    plus standard error of the mean:
## 
##                     Mean        SD  Naive SE Time-series SE
## (Intercept)     8.492824 1.170e+01 1.850e-01      2.006e-01
## datos$x        -4.278745 1.032e+01 1.632e-01      1.632e-01
## datos$y        -7.693852 1.102e+01 1.743e-01      1.783e-01
## datos$rec      -0.004121 1.741e-02 2.752e-04      3.025e-04
## datos$banios.m  0.124279 1.767e-02 2.793e-04      2.793e-04
## datos$const.m   0.002797 1.586e-04 2.507e-06      2.445e-06
## datos$est       0.056882 1.073e-02 1.697e-04      1.697e-04
## 
## 2. Quantiles for each variable:
## 
##                      2.5%        25%       50%       75%     97.5%
## (Intercept)    -14.507671   0.973703  8.635634 15.639209 31.374954
## datos$x        -24.758131 -10.940044 -4.387169  2.396214 15.894705
## datos$y        -30.459534 -15.069522 -7.477607 -0.285101 13.687851
## datos$rec       -0.038788  -0.015407 -0.003949  0.007808  0.029787
## datos$banios.m   0.089849   0.112016  0.124376  0.136385  0.158942
## datos$const.m    0.002486   0.002687  0.002798  0.002901  0.003106
## datos$est        0.035514   0.049826  0.057021  0.064126  0.077463
# eta hat (spatial random effects)
eta.hat <- t(model.1.recov$p.w.recover.samples)

model.1.eta.summary <- summary(mcmc(t(model.1.recov$p.w.recover.samples)))$quantiles[,c(3,1,5)]
model.1.eta.summary[1:5,]
##            50%      2.5%    97.5%
## var1 1.0915416 -21.59141 24.16618
## var2 1.1097061 -21.70512 24.20105
## var3 0.8599669 -22.06180 23.85060
## var4 1.1667923 -21.63231 24.11181
## var5 1.0549532 -21.80423 24.01024

We now proceed to the predictions:

# Make the prediction
# Covariate matrix
predcov <- matrix(cbind(rep(1,N),datos$x,datos$y,datos$rec,
                        datos$banios.m,datos$const.m,datos$est),
                        nrow=N,ncol=7)
# Uses composition sampling with MCMC draws from the
# posterior predictive distribution.
# The burn-in is 1000 samples and predictions are made
# every two samples (thin parameter)
pred <- spPredict(model.1, 
                  pred.coords=coords, 
                  pred.covars=predcov,
                  start=1000,
                  thin=2)
saveRDS(pred, 'pred.rds')

We obtain the posterior mean of the predictions:

pred <- readRDS('/Users/Gers/Dropbox/Data Incubator/Project/data/pred.rds')
str(pred$p.y.predictive.samples)
##  num [1:1000, 1:2001] 10.37 10 9.68 9.62 10.02 ...
## Posterior mean of the predictions.
post.pred.mean <- rowMeans(pred$p.y.predictive.samples)
(post.pred.mean[1:10])
##  [1] 10.367222  9.998798  9.680344  9.615805 10.021271  9.798127 10.126631
##  [8]  9.210340 10.043249 10.819778
# Kriging predictions
datos <- readRDS('/Users/Gers/Dropbox/Data Incubator/Project/data/datos_finales.rds')
datos_pred <- readRDS('/Users/Gers/Dropbox/Data Incubator/Project/data/datos_prediccion.rds')

# Linear regression
fma <- precio_log ~ 1 + x + y + rec + banios.m + const.m + est
mod.1 <- lm(formula = fma, data = datos)
summary(mod.1)
## 
## Call:
## lm(formula = fma, data = datos)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.81971 -0.23653  0.00519  0.26291  1.58607 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  9.265328   0.039863 232.430  < 2e-16 ***
## x           -2.934965   0.559000  -5.250 1.86e-07 ***
## y            3.242202   0.550129   5.894 5.17e-09 ***
## rec         -0.134960   0.020497  -6.585 7.38e-11 ***
## banios.m     0.210040   0.022063   9.520  < 2e-16 ***
## const.m      0.003588   0.000191  18.790  < 2e-16 ***
## est          0.057837   0.014082   4.107 4.33e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4093 on 993 degrees of freedom
## Multiple R-squared:  0.6338, Adjusted R-squared:  0.6316 
## F-statistic: 286.4 on 6 and 993 DF,  p-value: < 2.2e-16
# Bayesian kriging predictions
N <- 8100
xo <- seq(min(datos$x), max(datos$x), length = 90)
yo <- seq(min(datos$y), max(datos$y), length = 90)
grid <- akima::interp(x=datos$x, y=datos$y, z=datos$precio_log, 
                      xo = xo, yo = yo,
                      linear = FALSE, extrap = TRUE,
                      duplicate = "median")
grid <- cbind(expand.grid(grid$x, grid$y), c(grid$z))
colnames(grid) <- c('x','y','z')
coords <- grid[,c('x','y')]
predcov <- matrix(cbind(rep(1,N),coords$x,coords$y,
                        sample(datos$rec,N,replace=T),
                        sample(datos$banios.m,N,replace=T),
                        rgamma(N,2.62,0.017),
                        sample(datos$est,N,replace=T)),
                  nrow=N,ncol=7)

model.1 <- readRDS('/Users/Gers/Dropbox/Data Incubator/Project/data/model.rds')
model.0 <- readRDS('/Users/Gers/Dropbox/Data Incubator/Project/data/depas_reml.rds')
sigma2.reml <- model.0$sigmasq
phi.reml <- model.0$phi
tau2.reml <- model.0$tausq
trend <- ~ 1 + datos$x + datos$y + datos$rec + 
  datos$banios.m + datos$const.m + datos$est
depas_geo <- as.geodata(obj = datos, coords.col = c(20, 21), data.col = 22)
trend_pred <- ~ 1 + coords$x + coords$y + predcov[,4] + 
  predcov[,5] + predcov[,6] + predcov[,7]
kc.control <- krige.control(type.krige = "ok", 
                            trend.d = trend,
                            trend.l = trend_pred, 
                            obj.model = model.0,
                            cov.model = "spherical",
                            cov.pars=c(sigma2.reml, phi.reml), 
                            nugget = tau2.reml)

kc.sk.s0 <- krige.conv(depas_geo,
                       locations=coords,
                       krige=kc.control)
## krige.conv: model with mean defined by covariates provided by the user
## krige.conv: Kriging performed using global neighbourhood
pred_ok_media <- kc.sk.s0$predict
pred_ok_var <- kc.sk.s0$krige.var

With all of that we can make the following plots:

pred_grid_bayesiano <- spPredict(model.1, 
                  pred.coords=coords, 
                  pred.covars=predcov,
                  start=500,
                  thin=10)

saveRDS(pred_grid_bayesiano,'pred_grid_bayesiano.rds')
N <- 8100
pred_grid_bayesiano <- readRDS('/Users/Gers/Dropbox/Data Incubator/Project/data/pred_grid_bayesiano.rds')
post_pred_bayesiano <- rowMeans(pred_grid_bayesiano$p.y.predictive.samples)
post_pred_95ci_bayes <- t(apply(pred_grid_bayesiano$p.y.predictive.samples, 1, quantile, c(0.025, 0.975)))
dat <- data.frame(id=1:N,media=post_pred_bayesiano,post_pred_95ci_bayes)
dat <- cbind(dat,predcov)
colnames(dat)<-c('id','media','lb_bayes','ub_bayes','int','x','y','rec','banios','const','est')
dat$lon <- dat$x + mean_lon
dat$lat <- dat$y + mean_lat
dat$ks_precio<-kc.sk.s0$predict

lonq <- median(dat$lon)
latq <- median(dat$lat)
df <- get_map(location = c(lonq,latq),
              maptype = 'roadmap', zoom = 13)
saveRDS(df,'df.rds')
df <- readRDS('/Users/Gers/Dropbox/Data Incubator/Project/data/df.rds')
ggmap(df, extent = 'panel') +
  geom_tile(data = dat, aes(x = lon, y = lat, fill = media), alpha = 1) +
  stat_contour(data = dat, aes(x = lon, y = lat, z = media), binwidth = 10, size = 0.5) + 
  scale_fill_gradient2(low = "#0000FF", mid = "#FFFFFF", high = "#FF0000", 
                       midpoint = median(dat$media), space = "rgb", guide = "colourbar")

qmplot(lon, lat, data=dat,
       size=media, alpha=I(0.5), color=media,
       geom=c('point'), source='google') +
  scale_color_gradient(low='pink', high='red')

In both of the previous plots we notice that Polanco and the surrounding areas have a higher mean rent price, and that the rent price decreases as you move away.

Comparison between Simple Kriging and Bayesian Kriging predictions

# Add the ordinary kriging prediction intervals
dat$pred_ok <- pred_ok_media
dat$lb_ok <- pred_ok_media - 2*sqrt(pred_ok_var)
dat$ub_ok <- pred_ok_media + 2*sqrt(pred_ok_var)

ggplot(dat[sample(1:nrow(dat),100),]) + 
  geom_pointrange(aes(x = id, y=pred_ok, 
                      ymin=lb_ok, ymax=ub_ok), color='red')+
  geom_pointrange(aes(x = id, y=media, 
                      ymin=lb_bayes, ymax=ub_bayes), color='blue') 

In the previous graph we can see that the prediction intervals of the two models are similar but, in general, the probability intervals of the Bayesian predictions are narrower.
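A quick numeric check of this visual comparison (not in the original code) is to compare the average interval widths directly:

```r
# Mean prediction-interval width; a smaller value for the Bayesian
# intervals supports the conclusion above.
interval_width <- function(lb, ub) mean(ub - lb)

# e.g. interval_width(dat$lb_ok, dat$ub_ok)       # ordinary kriging
#      interval_width(dat$lb_bayes, dat$ub_bayes) # Bayesian kriging
```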